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1 . Intvoduation 

The  book  by  Forsythe  and  Moler  [1967]  gives  an  error  analysis 
for  Gaussian  elimination  dravm  from  the  work  of  Wilkinson  and  an  analysis  for 
Iterative  Improvement  due  to  Moler.  In  this  paper  we  do  a careful  backward 
error  analysis  using  a different  idea  of  what  it  means  for  a perturbation 
to  be  small,  namely  that  each  datum  is  subject  to  a small  relative  change. 

A system  of  n equations 
n 


Z 

j=l 


a . . 


1 < i < n. 


in  n unknowns  x. , 1 < i < n,  is  often  written  in  matrix  notation  as 

Ax  = b. 

The  solution  x computed  in  floating  point  arithmetic  is  not  generally 
exactly  equal  to  x,  but  it  still  may  be  acceptable  if  it  satisfies  one  of 
two  criteria: 

(1)  It  fulfills  the  accuracy  requirements  of  the  problem  poser. 
This  usually  involves  some  measure  of  how  far  x is  from  x. 
For  example,  in  determining  the  coefficients  of  an  inter- 
polating polynomial  it  is  some  norm  of  the  residual 
r = A(x  - x)  which  is  of  concern.  More  often  though  it 
is  some  norm  of  the  error  x - x which  is  of  concern. 

(ii)  It  is  the  exact  solution  of  a problem  which  differs  from 

the  given  problem  by  less  than  the  uncertainty  in  the  data, 
so  that  it  is  conceivable  that  x exactly  solves  the  original 
problem.  Uncertainty  in  the  data  is  generally  present  if 
only  because  of  the  roundoff  error  introduced  when  the 
numbers  are  put  into  the  computer. 


It  is  this  second  criterion  which  motivates  backward  error  analysis. 


ill  conditioning  found  in  Hamming's  [1971]  book  Introduction  to  Applied 
Numerical  Analysis.  The  condition  of  a system  is  the  sensitivity  of  the 


solution  to  uncertainty  in  the  data,  and  this  sensitivity  is  often  measured 


by  a condition  number.  The  usual  definition  of  the  condition  number  is 
based  on  the  idea  that  the  uncertainties  in  the  data  are  roughly  the  same 
size  in  an  absolute  sense.  However,  if  we  suppose  that  these  uncertainties 


have  roughly  the  same  relative  size,  then  we  obtain 


as  the  condition  number  of  a linear  system  where  ||o||  is  the  max  norm. 

There  are  at  least  a couple  of  reasons  for  believing  that  this  second 
approach  is  more  realistic.  First,  most  numerical  computations  are  done 
in  floating  point  rather  than  fixed  point  arithmetic,  and  for  floating 
point  computation  the  conversion  of  data  to  machine  represented  to 
numbers  results  in  errors  of  the  same  relative  size.  Second,  measurement 
errors  are  usually  more  nearly  the  same  in  relative  size  than  in  absolute  size. 

In  §3  we  pose  the  question:  What  is  the  least  amount  by  which 

A and  b must  be  perturbed  so  that  x exactly  solves  the  perturbed  problem? 

The  answer  to  this  question  turns  out  to  be 


I X 


which  we  call  the  "backward  error."  If  this  is  less  than  the  unit  roundoff 


error  u,  then  the  second  acceptability  criterion  is  satisfied.  And  if  it 


can  be  shown  that  an  algorithm  produces  a backward  error  that  is  always 
bounded  by  some  fixed  multiple  K(n)u  of  the  unit  roundoff  error,  then  by 
increasing  the  precision  of  the  intermediate  results  by  a factor  K(n),  the 


1 


3 


second  acceptability  criterion  can  be  met.  An  algorithm  with  this  desirable 
property  is  said  to  be  stable.  If  an  algorithm  is  only  stable  for 
infinitesimal  values  of  u,  it  is  said  to  be  asymptotically  stable. 

The  stability  of  Gaussian  elimination  with  row  pivoting  (usually 
called  partial  pivoting)  is  examined  in  §4.  By  using  an  example  of 
Hamming's  it  is  shown  that  row  pivoting  is  not  asymptotically  stable  even 
for  systems  with  equilibrated  matrices.  Then  by  means  of  a careful  error 
analysis  performed  in  Appendix  A a bound  on  the  backward  error  is  obtained 
which  contains  the  quantity 

"•f  (|D~^A||x|). 

(|D“^A||i|)^ 

where  is  the  matrix  of  row  scaling  factors.  This  quantity  is  minimized 


by  choosing 


= dlag( |A| |x| ) , 


which  calls  for  the  i-tb  row  to  be  divided  by  a.,  x,  + a.„  x„  +. . .+ 

' il  1'  ' i2  2' 

la.  X |.  It  is  shown  that  with  such  a choice  for  D,  row  pivoting  would  be 
' in  n ' 1 ° 

stable.  Of  course  this  is  impractical,  which  explains  Stewart's  [1973,  p.  158] 
observation  that  "In  spite  of  intensive  theoretical  investigation,  there  is 
no  satisfactory  algorithm  for  scaling  a general  matrix."  Nonetheless,  the 


"f  (|A||i|). 

is  an  excellent  a posteriori  measure  of  how  poorly  scaled  the  system  is. 

Sometimes,  programming  considerations  (Sherman  [1976])  call  for 
the  use  of  column  pivoting  Instead  of  row  pivoting,  where  by  column 
pivoting  we  mean  that  columns  are  interchanged  so  that  each  pivot  is  the 
largest  in  its  row.  In  §5  it  is  shown  that  column  pivoting  could  be 


made  stable  if  it  were  somehow  possible  to  scale  the  columns  with 


the  matrix  of  scale  factors 


D2  = diag(|x|). 

This  calls  for  each  column  to  be  multiplied  by  its  corresponding  computed 
solution  value.  A measure  of  ill  scaling  is  given  by 

( lAje)  ||x|| 
max  ' ' II  II 

i (|A| |x|)^ 

where  e is  the  vector  of  all  ones. 

Row  pivoting  may  be  regarded  as  the  generalization  of  complete 
pivoting  in  which  the  ordering  of  the  columns  is  arbitrary,  and  similarly 
column  pivoting  as  the  generalization  in  which  the  ordering  of  the  rows  is 
arbitrary.  From  this  observation  it  follows  that  the  results  of  both  §4 
and  §5  apply  to  complete  pivoting.  However,  one  suspects  that  the  error 
of  complete  pivoting  satisfies  an  error  bound  which  is  appreciably  better 
than  simply  the  smaller  of  the  bounds  for  row  and  column  pivoting. 

In  §6  iterative  Improvement  is  examined  in  the  hope  that  the  poor 
stability  properties  of  Gaussian  elimination  can  be  corrected.  It  is  shown 
that  a single  iteration  of  iterative  improvement  performed  in  single 
precision  is  enough  to  make  Gaussian  elimination  asymptotically  stable. 

Before  proceeding,  it  might  be  interesting  to  demonstrate  the 
instability  of  complete  pivoting  with  a simple  2x2  system  of  equations. 
Consider  Ax  = b where 


r ^ 

3 

1 

A = 

and  b = 

_-i 

0_ 

_0_ 

The  coefficient  matrix  A is  equilibrated  according  to  the  definition  of 
Forsythe  and  Moler  [1967,  p.  45].  Using  rounded  t digit  decimal  (floating 
point)  arithmetic  the  elimination  step  yields  A*x  = b'  where 
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3 3 

1 

A'  = 

and  b ' = 

_0  .99- ••9_ 

.33- • -3_ 

and  so  the  computed  solution 


X 


.33---3  X 10 

.33---3 


The  backward  error  is  determined  by  considering  perturbed  problems  of 
the  form 


3(1  + 6^^) 

3(1  + 6^2> 

^1~ 

-1  + 6^3- 

- (1  + 62j^) 

0 

0 

and  choosing  the  relative  changes 
In  this  case,  62^^  must  be  chosen 


6..  so  as  the  minimize  the  maximum  6.J, 
ij  ' ij' 

to  be  -1,  and  so  the  backward  error  is 


100%  regardless  of  the  precision  t. 


- : . ■ - -■  'jr,-. 


2.  Condition  of  Linear  Systems 

The  condition  of  a problem  is  the  sensitivity  of  its  solution  to  uncertainties 
in  the  problem  data.  The  importance  of  this  concept  is  that  it  indicates  the 
amount  of  accuracy  that  one  should  reasonably  expect  for  the  solution  of  a 
problem  with  inexact  data.  And  even  for  problems  with  exact  data,  the 
conversion  of  the  numbers  to  the  computer's  floating  point  number  base  usually 
introduces  errors. 

As  the  measure  of  the  condition  of  a problem  we  take  the  maximum 
amount  by  which  an  Infinitesimal  perturbation  in  the  problem  data  can  be 
amplified  in  the  solution.  More  precisely  if  C denotes  the  given  problem 
data  and  (pCC)  denotes  the  solution  of  a problem  with  data  then  we  define 
the  ao'^  ' '>n  number  to  be 


11m  relative  distance  from  j>(g)  to  4i(^) 
relative  distance  from  C to  C 


(2.1) 


In  the  case  where  ^ and  <j)(C)  are  scalars,  the  condition  number  is  the  absolute 
value  of  the  relative  derivative,  namely 

(cf.  Bauer  [1974]).  For  linear  algebraic  systems  Ax  = b we  have  ^ = (A,  b) 

and  (()(C)  = A ^ b.  (In  roundoff  analysis  the  number  of  equations  n is  not 

considered  to  be  part  of  the  problem  data;  rather  we  take  the  point  of  view 

that  each  value  of  n defines  a separate  class  of  problems.)  There  are  two  crucial 

matters  that  have  to  be  settled:  (i)  how  to  define  relative  distance  in 

the  problem  space,  and  (11)  how  to  define  relative  distance  in  the 

solution  space. 


Any  problem  which  is  to  be  solved  in  an  approximate  sense  is 


incomplete  unless  there  is  also  some  "metric"  Specified  for  measuring  how 


good  the  approximation  is.  It  is  this  metric  that  should  be  used  in  defining 
the  "relative  distance  from  (t)(C)  to  <KC)*"  Often  this  metric  measures  how 
close  the  approximate  solution  is  to  the  true  solution;  in  other  cases  it 
measures  how  well  the  approximate  solution  satisfies  the  problem.  In  this 
section  we  choose  to  consider  the  problem  of  approximating  A ^ b rather  than 
that  of  approximately  solving  Ax  = b.  The  ratio 

X - X 

IW 

where  11*11  denotes  the  max  norm  is  an  adequate  measure  of  relative  distance 
for  most  purposes  if  the  unknowns  are  appropriately  scaled. 

The  question  of  how  to  measure  the  "relative  distance  from  C to  C" 
is  more  difficult  to  answer  because  a completely  specified  approximation 
problem  need  not  Include  a metric  for  the  problem  space.  However,  there  is 
one  metric  which  is  always  safe  to  use,  namely  the  componentwise  relative 
error 


If  the  value  of  this  quantity  is  small,  then  5 is  close  to  ? by  any  reasonable 
standard,  especially  in  view  of  the  fact  that  putting  data  into  the  computer 
results  in  small  componentwise  errors.  This  metric  has  another  advantage  in 
that  it  is  always  meaningful  regardless  of  the  physical  dimensions  of  the 
problem  data  and  thus  it  is  independent  of  possibly  arbitrary  choice  of  units 
for  the  data.  For  these  reasons  we  take  as  our  measure  of  relative  distance 
the  smallest  £ ^ 0 such  that 

I £ e|a^^ I and  |b^  - b^ | < e |b^ 1 . 

This  seems  to  be  consistent  with  the  ideas  expressed  in  Hamming's  [1971]  book 
Introduction  to  Applied  Numerical  Analysis.  On  page  117  it  is  stated  that 
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"The  term  'ill-conditioned'  is  ill  defined.  The  vague  idea 
is  that  small  changes  in  the  initial  system  can  produce  large 
changes  in  the  final  result.  If  we  are  to  take  floating 
point  seriously,  then  we  should  say  'relatively  small  changes' 
and  'relatively  large  changes'." 

and  on  page  122  it  is  stated  that 

"...the  system  is  indeed  ill  conditioned  because,  no  matter 
how  we  try,  we  are  unable  to  solve  the  system  so  that  the 
answer  is  not  sensitive  to  small  changes  in  the  original 
coefficients. " 

Thus  it  seems  that  by  "relatively  small  changes  in  the  initial  system" 

Hamming  means  "relatively  small  changes  in  the  aoeffioients  of  the  initial 
system."  A similar  thought  is  expressed  by  Kahan  [1966,  p.  795].  It  is 
worth  mentioning  that  this  approach  has  the  advantage  of  forcing  the  perturbed 
matrix  A to  have  the  same  sparsity  structure  as  the  original  matrix  A,  making 
it  more  plausible  to  regard  A as  the  result  of  perturbing  the  original  physical 
problem. 


Having  chosen  our  metrics,  we  are  in  a position  to  determine  the 
condition  number  of  a system  Ax  = b . We  begin  by  obtaining  bounds  on  the 
uncertainty  in  the  solution  due  to  the  uncertainty  in  A and  b.  Bounds  of 
this  type  also  appear  in  Bauer  [1966].  Our  notation  uses  inequalities 
between  arrays  to  mean  inequality  of  the  corresponding  components.  The 
absolute  value  of  an  array  is  also  to  be  understood  in  a componentwise  sense. 

THEOREM  2.1.  Let  Ax  = b and  (A  + 5A) (x  + 6x)  = b + 6b  where 
I 6 A I £ e I A 1 and  1 6b  | _<  e | b | . Then 

JM,  ^ II  |A-h|A||x|  t lA'hlbl  II 
(1  -elllA-'llAllWIxll 

provided  that  the  denominator  is  positive. 

PROOF.  We  have  that 

6x  = A ^6A(x  + 6x)  + A ^6b  , 


(2.2) 
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and  so 

|6x]  £ |a  ^|16A|(1x|  + !6x|)  + |a  ^l|6b| 

£e  |a^||a|(1x1  + |6x|)+e  |A^||b|  . 

Therefore 

|l6x||  < elllA-^llAlIxl  + |A-ll|b||l+  e||lA-ll|A!|!||6xll  . Q.E.D. 


Note.  Bauer  [1966]  shows  that  the  bound  of  Theorem  2.1  can  be 
improved  by  replacing  (1  - e ||  |a  ^||a|||)  ^ by  ||(I  - e |A  ^||a|)  ^1|,  and  so  it 
is  not  necessary  that  e |]  |a  |a|  1|  < 1 but  only  that  the  spectral  radius  of 
e |a  ^|(a|  be  less  than  1. 

THEOREM  2.2.  Let  Ax  = b.  Then  there  exist  6A  and  fib  such  that 
|6A|  = £ |a|,  I fib  I = e |b|,  and  the  solution  x + fix  o/  (A  + 6A) (x  + fix)  = 
b + fib  satisfies 


|||A-^||A||x|  |A-^||b||| 

(1  + £ |||A-ll|A||l)||x|| 


PROOF.  Let  £.  be  such  that 

(|A-‘|lA||x|  + |A-l||b|),  .|I|A-M|A||x|  + |A-l||b|||  . 

Define  fiA  and  fib  by 

a., 

Jk 

and 


fia.R  = sgn(a^.x^)  e | 


where  A 


-1 


fihj  = sgn(aj^j)  e |bj  | 

(a^j).  Then 

(A  ^fiAx  + A ^5b)„  = Z E a.,  fia.,  x,  -i-  Z a.  . fib. 

j k ^ j ^ 

• £ (|a-1||a||x|  + |A-I||b|), 

- £|||A-h|A||x|  + |A-l||b|||  , 


I but  from  (2.2) 


(A  ^6Ax  + A = (fix  - A ^fiAfix)j^  , 

i and  so 

£||1a-1||a||x|  + iA-i||b|ii<ii6x|i  + t,  |I|a‘M|a|IIII«»II-  o.e.d. 
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In  subsequent  sections  of  this  paper,  we  will  consider  the  effects 
of  perturbing  only  the  elements  of  the  coefficient  matrix. 

THEOREM  2.4.  Let  Ax  = b and  (A  + 6A)  (x  + 6x)  = b where  |6A|  _< 
e I A I . Then 

< e II  |a~^|  |a|  |x|  II  ^ 

* I - (1  - £H|a-M|a|||)  ||x|| 

PROOF.  Similar  to  that  of  Theorem  2.1.  Q.E.D. 

THEOREM  2.5.  Let  Ax  = b.  Then  there  exists  6A  suah  that  |<SA[  = e |a| 
and  suah  that  the  solution  x + 6x  of  (A  + 6A) (x  + 6x)  = b satisfies 

l«xj|  , 1^  IIIa'^IIaIIxIII 

- (lA-e|||A-*||A|||)||x|| 

PROOF.  Similar  to  that  of  Theorem  2.2.  Q.E.D. 

It  follows  from  these  last  two  theorems  that  when  only  A is  subject 
to  uncertainty  the  condition  number  is 

Cond(A,  x)  = II  ^ • 

l|x  I 

Since  II  |A-1|  |A|  |x|  ||  < ||  |a"1|  |a|  |x|  + |a-1|  |b|  ||  < 2||  |a"1|  |a|  |x|  ||. 

Cond(A,  x)  is  also  adequate  for  the  case  where  both  A and  b are  subject  to 
uncertainty.  A similar  quantity 

||a"^I1  T.  llAe^j^  II  |x^  I 

<(A,  X)  = ■ 

is  used  by  Van  der  Sluis  [1970a],  which  he  calls  [1970b]  the  "condition 
number  of  the  solution."  Here  denotes  the  j-th  unit  vector. 

The  condition  number  of  a matrix  A could  be  defined  as  the 

T 

maximum  value  of  Cond(A,  x),  which  is  achieved  with  x = e = (1,  1,...,  1)  . 

Thus 

Cond(A)  = Cond(A,  e)  - ||  [a"^  | |a[  ||. 

This  quantity  is  more  satisfying  as  a measure  of  ill  condition  than  the 
usual  cond(A)  = ||a  ^|||a||  for  a couple  of  reasons.  First,  the  matrix 
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|a~^|1a1  is  a mapping  of  the  solution  space  into 

itself,  which  means  that  the  quantity  || ] A ^||a|1|  can  be  defined  entirely  in 
terms  of  the  solution  space  norm.  Whereas  cond(A)  = ||A  ^||  ||a||  is  defined  in 
terms  of  a solution  space  norm  aiid  a residual  space  norm,  which  seems  quite 
unnecessary.  Second,  the  quantity  Cond(A)  is  invariant  under  row  scaling. 
Multiplying  a system  of  equations  by  a diagonal  matrix  does  not 
change  the  problem  in  any  fundamental  way.  For  example,  all 
systems  Dx  = b where  D is  diagonal  are  well  conditioned.  Accordingly,  we 
h.ive  that  Cond(D)  = 1;  whereas  cond(D)  can  be  arbitrarily  large. 

Example.  According  to  Hamming  [1971,  p.  120],  the  system  Ax  = b 
is  well  conditioned  where 


3 2 1 

3 + 3e 

A = 

2 2e  2e 

, b = 

6e 

_1  2e  -E_ 

_ 2e  _ 

The  inverse  of  the  coefficient  matrix  and  the  solution  are  given  below: 


-.6c 

.4 

.2 

c 

a"1  = ^ 

1-1. 8e 

.4 

.3 

.2c“^- 

.6 

, X = 

1 

.2 

.2c'^-, 

.6 

-.4c-l- 

.6_ 

Hence 


1+1. 8c 

2.4c 

1.6c 

4e~Vl.2 

1.4-.6C 

.8 

-1 

8c  ^ 

1.6 

l-.6c 

and 
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|A“^||A||x|  + |A"M|b|  = 


1-1. 8e 


9.6e  + 3.6e 
4.8  + 2 .4e 
6 - 2.4e 


which  shows  that  the  system  is  well  conditioned.  However, 


Cond(A)  = 


. 8e  ^ + 2.6  - . 6e 


1-1. 8e 

which  indicates  that  the  system  would  be  ill  conditioned  for  some  different 
right  hand  side  b,  and  in  fact,  Hamming  [1971,  p.  122]  gives  such  an  example. 


I 


3.  Stability  of  Algorithms  for  Linear  Systems 

Let  +,  X,  / denote  the  floating  point  operations  corresponding 
to  +,  X,  /,  Every  reference  to  a floating  point  result  x 6 y carries  with 
it  the  assumption  that  x,  o,  and  y are  such  that  the  result  is  well  defined. 
Nothing  is  assumed  about  the  floating  point  arithmetic  except  that  the 
relative  roundoff  error  is  bounded  by  u/(l  + u)  where  the  unit  roundoff 
error  u is  a small  positive  number;  that  is, 

X 6 y = (x  o y)  (1  + 6) 


for  some  6 depending  on  x,  o»  and  y which  satisfies 


1^1  • 


It  follows  from  the  above  condition  that 


X O y = ^ 

1 + 6' 

where  |6'|  £ u.  Note  that  for  rounding  u = -|-  3^  ^ and  for  chopping  u = 3^  ^ 
where  3 is  the  base  and  t is  the  number  of  base  3 digits  in  the  fraction  of 
the  floating  point  numbers. 

For  any  computed  solution  x we  define  the  relative  baekward  error 
to  be  the  smallest  real  number  such  that 

(A  + 6A) (x  - 6x)  = b + 6b 

for  some  6A,  6b,  and  6x  with  |6A|  1 6b  | £n^2j|b|,  and 

|6x|  ^ ^(3)1^  - 6x| . The  backward  error  can  be  interpreted  in  the  following 
way;  The  computed  solution  Sc  is  the  rounded  solution  of  a problem  with 
rounded  data  where  maximum  relative  roundoff  error.  Thus,  if 

is  no  larger  than  the  unit  roundoff  error  u,  then  our  solution  is  as 
good  as  our  data  deserve;  otherwise,  improved  accuracy  may  be  justified, 
perhaps  by  using  iterative  improvement.  Further  motivation  for  this  definition 
is  given  in  Miller  [1975]  and  Bauer  [1974].  If  a solution  cannot  be  computed 


because  the  matrix  is  nearly  singular,  then  the  backward  error  is  defined  to  be 
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the  smallest  real  number  that  A + 6A  is  singular  for  some  |6a| 

with  16a|  1 h^2) 1^1 • 

Stability  of  the  algorithm  means  that  there  exists  a stability 
constant  K(n)  and  a stability  threshold  u(n)  > 0 both  independent  of  the 
problem  data  (A,  b)  such  that  the  relative  backward  error 

1 K(n)u 

provided  that  u £ u(n).  A weaker  concept  asymptotic  stability  allows  the 
threshold  u(n)  to  be  data  dependent.  These  two  types  of  stability  are  the 
same  as  the  "backward  stability"  and  "asymptotic  backward  stability"  used 
by  Miller  [1972] . 


The  backward  error  is  not  easy  to  determine,  and  for  this 

reason  we  introduce  two  variants  of  the  backward  error  which  are  easier  to 
compute.  Let  be  the  smallest  real  number  such  that 


(A  + 6A)x  = b + fib 

for  some  fiA  and  fib  with  | fiA  | £n^2)l^l  I 1 ^(1) 

smal]est  real  number  such  that 

(A  + fiA)x  = b 


for  some  fiA  with  |fiA|  _<  n^^iA(.  Naturally  £ ^(2)  — '^(1)' 

THEOREM  3.1.  Let  I = {i:  (|a||x|  + |b|)^  = 0}.  The  hackuard  error 


r 


(b  - Ax)  , 


jiTi  (lAlIxf  + |b|)^  - "^^>i  = 0 for  ±e  I, 


'(2) 


“ otherwise. 

PROOF.  First  consider  the  case  where  (b  - Ak) . 4 0 for  some  i^  I. 


Suppose  that  h^2)  ^ Then  there  exist  fiA  and  fib  such  that  (A  + fiA)  x 

= b + fib  where  | fiA | ^ h^2)  I I I .1  h(2)  ^ have 


(3.1) 


|b  - A5l  I = 1 6Ax  - 6b  | 

< |6a| IxI  + l6b| 


which  is  impossible  since  (b  - Ax)^  ^ 0 and  (|a||x|  + Ib|)^  = 0.  Therefore 


ti^2)  = + ".  Second  consider  the  case  where  (b  - Ax)^  = 0 for  all  i € I, 


To  obtain  an  upper  bound  on  <^°^®ider  the  choice 


(b  - Ax). 


6a 


ij 


(TaTUI  + |bl^ 


1 4 I. 
1 e I, 


and 


- b. 


(b  - Aii) 


6b.  = 
1 


(1A|1*|  |bl). 


i 4 I, 

i G I. 


We  have  6 Ax  - 6b  = b - Ax  or  (A  + 6A)x  = b + 6b,  and  so 

I (b  - Ax) . I 


TIaIIxI  +TbT); 


Since  Pz-n  < + “,  equation  (3.1)  must  hold; and  so 
K^) 


(b  - A5t) 


^2)  - (lA||x|  + |b|)^ 


Q.E.D 


'(1) 


THEOREM  3.2.  Let  I = U :(|a11x1)^  = 0}.  The  baakward  error 
(b  - Ax) . I 

if  (b  - Ax)^  = 0 for  i € I, 


max 


(A 


otherwise. 


i ^ I 
+ ® 

PROOF.  Similar  to  Theorem  3.1.  Q.E.D. 

Remark.  Similar  types  of  results  apply  to  other  problems;  for 


example,  the  backward  error  for  a polynomial  equation  ag  x"  + a^  x^  ^ +. . .+ 


a^  = 0 is  given  by 


'(2) 


|a_  R*'  + a,  x”  ^ +...+  a | 
' 0 1 n ' 


ho  ^"1  hi 


+. . .+  I < 


I 
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The  following  theorem  gives  bounds  on  the  relative  backward  error 
terms  of  the  more  easily  computed  and  h^2)‘  These  bounds  show 

that  the  n's  are  roughly  the  same  size  when  the  backward  error  is  small, 
and  so  in  the  remainder  of  the  paper  only  the  quantity  Is  used,  which 

we  denote  simply  by  n. 


THEOREM  3.3.  The  three  types  of  baahJard  error  satisfy 


2 + H(2)  - '’(3)  - '’(2)  ’ 


(3.2) 


3 


- ^(3)  - ^(1)  ’ 


(3.3) 


2 + - ’^(2)  - ^1)  • 


(3.4) 


PROOF.  The  second  inequalities  of  (3.2),  (3.3),  and  (3.4)  are 
obvious.  The  first  inequality  of  (3.2)  is  obvious  if  ’1^^)  — Hence 
assume  ^ !•  There  exist  6A,  6b,  6x  such  that 

(A  + 6A) (x  - 6x)  = b + 6b 

where  |6a|  £ n^^^lAj,  |6b|  £ n^j|b|,  |6x|  £ ~ ‘Sx]  . Hence 

|b  - Ax|  = |6A(x  - 6x)  - A6x  - 6b| 


It  is  easily  shown 


£ 2n^3^  |a|  |k  - 6x|  + |b| 


1^  - 


and  so 


b - Ax  < ^ 

' ' — 1 - n, 


(IaMrI  + Ibl). 


(3.5) 


(3.6) 


Therefore  ^(2)  — ^*^(2)^^^  ~ *^(3)^  which  verifies  (3.2).  The  first  inequality 

3 3 

of  (3.3)  is  obvious  if  1^3^  — T'  Hence  assume  n^3^  < y.  From  (3.5) 
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|b  - Ax  I 1 |a1  |x  - 6x1  + '^(3)1*^  ~ (3)  1^1  1*1  » 

and  using  (3.6)  gives 

<l-n(3,)|b-A8|  < (34^  + n(3,)|A||«|. 

Therefore 


\l)  ~ 


"^(3)  ~ ^(3)^ 


(1 

3n 


^3))‘ 


(3) 


(1 


T'(3) 


) 


(1  - 3h(3^)(l 


1 . 
3^(3)'' 


which  proves  (3.3).  The  first  unequality  of  (3.4)  is  obvious  if  n^2)  ~ 
Hence  assume  >^^2)  ^ have 

lb  - Axl  in^2)(|^l  1^11x1)  £h^2)^l^  " ^l^lli^l), 

and  so 


lb  - Axl  1 1aM*|. 

\2) 

Therefore  £ 2n^2)/(b  “ '^(2)^’  '^bich  implies  (3.4).  Q.E.D. 

A good  algorithm  should  (i)  return  an  acceptable  answer  most 
of  the  time  (robustness),  and  (ii)  signal  failure  whenever  it  does  not 
return  an  acceptable  answer  (reliability).  We  formally  define  an  algorithm 
to  be  reliable  If  there  exist  K(n)  and  u(n)  such  that  for  any  (A,  b)  and 
any  u ^ u(n)  either  the  algorithm  computes  an  answer  with  n ^ K(n)u  or  the 
algorithm  signals  failure. 

Any  algorithm  for  solving  linear  systems  can  be  made  reliable  by 
computing  the  backward  error  with  floating  point  arithmetic  and  then  accepting 
the  answer  only  if  the  computed  backward  error  is  less  than  a prescribed  multiple 
of  the  unit  roundoff  error.  For  example,  the  next  theorem  shows  that  if  the 
computed  backward  error  f|  ^Ku,  then  we  can  conclude  that  n (K  + n)ue . 
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The  residual  is  to  be  computed  in  single  precision 

i +a.2  i X2)...  +a^^  x x^) 

or  in  double  precision 

= fl(b^  - (...(a^j^  X x^  + a^2  ^ ^2^”’  ^ ^in  ^ ^n^^‘ 

Here  6 denotes  the  double  precision  counterpart  of  6 where  it  is  assumed  that 

X 6 y = (x  O y)  (1  + 6) 

2 2 

with  |6|^u/(l+u).  In  practice  the  double  precision  unit  roundoff  error 
is  either  this  small  (rounding  in  base  two)  or  smaller.  By  fl(“)  we  mean 
the  conversion  of  a double  precision  value  to  a single  precision  value.  It 
is  assumed  that  fl(xoy)  = (xoy)(l+6)  with  -u/(l+u)  £ 5 £ u,  which  is  true  for 
rounding  and  chopping.  The  computed  backward  error  f|  is  determined  by 
n = max  (|f^|/(...(|a^j^  x | + |a^2  ^ ^2 1 ) . . . + | x 

THEOREM  3.4.  If  ^ is  the  aomputed  value  of  Hj  then 

-(n+2)u  " r.  ^ (n+2)u  » . _ nu 

e n-nOe  <n<e  n+nue 


u for  single  precision  residual  accumulation, 
2 

u for  double  precision  residual  accumulation. 


PROOF.  Let  q be  the  computed  value  of  Ax.  By  the  usual  type  of 


error  analysis 


U - AR|  < [(1  + u)"  - 1]|a| |x| 


We  have  that 


< nu  e’^'^lAl  |r|  . 


(1  - 

1 + u 


(3.7) 
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(1  + u)(l  + j ^ ^ 

n < ^ ^ ^ max  'TTP^i 

“ n - ^ A)  |x) 

^ 1 + u"^ 

where  division  of  two  vectors  is  defined  componentwise.  This  reduces  to 


h < max  Jb 


(1  + 2u)(l  + u) 
from  which  it  follows  that 


A x - 


(1  +-2u) 
(1  + u)’ 


Using 


-(n+2)u  . , b - ^ , (n+2)u 

e n < max  - . r i--f ' < e p. 

— A x — 


|b  - q|  - |q  - Ax)  ^ |b  -Ax|  _<  |b  - q|  + |q  -Ax|  . 


and  (3.7)  gives 

jb  - q)  - nu  e^^  ^^^)a|  jxj  £ jb  - Axj  < jb  - qj  + nu  e^’^  ^^^|a|  jxj  . 


Dividing  by  |a||x|  and  taking  the  maximum  yields 

b - dl  - (n-2)a  , . b - q|  . _ 

max  ® £ h £ max  - -^-i  i ' + nu  e 


(n-2)u 


Q.E.D. 


Before  concluding  this  section,  it  should  be  mentioned  that  for  some  classes 
of  problems  it  may  be  unreasonable  to  expect  an  algorithm  to  be  stable.  If  the  number  of 
output  values  is  fairly  large  compared  to  the  number  of  input  values,  then 
it  becomes  very  difficult  for  an  algorithm  to  be  stable  because  each  output 
value  must  arise  from  the  same  perturbation  of  the  input  values.  For 
example.  Miller  [1975]  shows  that  the  usual  algorithm  for  inverting  triangular 
matrices  is  unstable.  Hence  it  seems  better  to  use  stability  as  a relative 
concept  rather  than  an  absolute  concept.  This  Idea  is  used  by  Miller  [1976] 
in  a ^aper  entitled  "Roundoff  analysis  by  direct  comparison  of  two 
algorithms . " 
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4.  Gaussian  Elimination  with  Row  Pivoting  and  Scaling 

This  section  applies  the  ideas  of  the  preceding  section  to  Gaussian 

elimination  with  partial  pivoting  using  row  interchanges  and  implicit  row 

scaling.  The  reciprocals  of  the  scale  factors  are  to  be  given  as  inputs 

d, , d-,...,  d to  the  algorithm,  and  so  the  pivoting  is  done  as  if  one 
12  n 

were  solving  D ^Ax  = D where  D = diag(d,,  d„,...,  d ).  To  keep  the 

12  n 

notation  simple,  it  is  assumed  that  the  equations  are  numbered  according 
to  their  ordering  after  all  row  interchanges  have  been  performed.  The 
computations  of  the  algorithm  are  as  follows: 

(1) 

a . . = a . . , 

n 11 


for  k = l(l)n  - 1, 


"’ik  = ilk  + 1. 


(4.1) 


(k+1)  (k)  , ^ (k) 

a : . = a : . - m . , X a,  . , 

ij  ij  ik  kj 


i,j  > k + 1,  (4.2) 


0 if  i < j , 

<,  1 if  i = j, 

' m . . if  i > j , 
ij 

. if  il  j, 

\ 0 if  i > j. 


Yl  = b^. 


for  i = 2(l)n,  Vj  = b^^  - (...(i^j^  x ^ ^ *'i,i-l  ^i-1^  ’ 


X = y /u  , 
n n nn 


for  i = n - 


= (y^  ^ ^ Vl  ^,i+2  ^ h+2^--  + “in  " 

It  is  assumed  that  the  selection  of  a pivot  is  done  exactly  so  that 


'‘^1  ^ik 


< la-1 
- '‘^k  \k 


1 > k + 1. 


(4.5) 


In  cases  where  there  are  more  than  one  suitable  pivot,  the  one  with  the  lowest 
row  index  is  chosen.  The  assumption  of  exact  pivot  choice  avoids  some  minor 
technical  difficulties,  and  it  also  makes  for  a sharp  error 


bound  in  the  case  where  there  is  no  scaling. 
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It  is  important  to  appreciate  the  nature  of  the  functional 

relationship  between  ^ and  D.  The  computed  solution  x is  a function 

^(P)  of  the  row  permutation  P,  which  in  turn  is  a function  11(D)  of  the 

scaling  matrix  D.  (Note  that  11(D)  is  also  defined  for  values  of  D which 

are  not  floating  point  numbers  because  the  algorithm  does  not  perform 

floating  point  arithmetic  on  D.)  If  5c  is  viewed  as  a function  defined 

in  (dj^,  d2,*..,  d^)-space,  it  would  be  constant  over  regions  bounded  by 

(k) 

hyperplanes  passing  through  the  origin.  For  example,  let  a^^  be  the  values 
corresponding  to  a certain  choice  d^  = d^  of  scale  factors.  Then  x is 
constant  for  all  values  of  (d^^,  d^,...,  d^)  which  satisfy 


> 


ik 

(k) 

kk 


1 > k. 


It  is  necessary  for  the  completeness  of  the  theory  that  the  Gaussian 

elimination  algorithm  be  extended  to  permit  the  use  of  zero  as  a scaling 

factor.  For  any  diagonal  matrix  D = dlag(d^,  d2,...,  d^)  we  let  D^  denote 

the  matrix  dlag({d,}  , {d„}  ,...,  {d  } ) where 
1 e 2 E n e 


{d}  = 

e 


d if  d 0 
.£  if  d = 0. 


That  is,  D^  is  the  matrix  D with  all  the  zero  diagonal  elements  replaced  by 
e.  The  condition  (4.5)  is  replaced  by  the  following: 


11m 
e ^ 0 


id . i a., 
i e ik 


^‘^k^  \k 


< 1. 


— I Ik)  Ck) I 

Let  e = min{|d^  ^kk  '^Ik  5^  0}*  Then  it  can  be  easily  shown 

that  for  any  e with  0 < |e|  < e the  scale  matrix  D^  has  the  same  effect  on 
the  choice  of  pivots  as  does  D. 

Unfortunately,  Gaussian  elimination  with  pivoting  is  unstable. 
This  instability  arises  in  the  decomposition  stage  when  the  quantities 
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(k+1)  (k)  . (k) 

a..  =a..  -m.,  xa,  . 

ij  ij  ik  kj 

are  being  formed.  If  |m.,  1 >>  lafV  I , then  the  error  in  is  not 

ik  kj  ' ' ij  ' ij 

(k) 

very  small  relative  to  a..  , and  so  in  our  backward  error  analysis  we  cannot 

(k) 

throw  this  error  back  into  a...  . The  extreme  case  occurs 

ij 

when  af^^  = 0 and  4 0,  which  is  commonly  called  "fill  in."  For 

iJ  ij 

sparse  systems  of  equations  it  is  quite  common  to  order  the  rows  so  as  to 


avoid  fill  in.  This  reduces  computational  cost,  and  it  apparently  may 
also  contribute  to  stability. 

The  instability  of  Gaussian  elimination  has  been  pointed  out  by 


Hamming,  who  on  page  119  of  his  book  [1971]  announces  the 

"Theorem  Pivoting  can  take  a well-conditioned  system  into  an 
ill-conditioned  system  of  simultaneous  linear  equations." 

and  on  page  123  states 


"We  have  not  justified  the  pivoting  method;  rather  we  have 
shown  that  it  is  an  'old  wives'  tale.'  But  like  most  old 
wives'  tales,  it  is  a mixture  of  truth  and  mystic  faith." 


To  prove  his  theorem,  Hamming  uses  the  example  discussed  at  the  end  of  §2. 
For  this  example,  one  elimination  step  with  partial  or  complete  pivoting 
yields  the  system  A'x  = b'  where 


I " ‘ " 3 + 3e 

A'  = j 0 ^ + 2e  ^ + 2e  , b'  = -2  + 4e 


0 -2  ^ - -1 

-3-  + 2e  - - e 


•1  + E 


(4.6) 


assuming  exact  arithmetic.  This  problem  is  ill  conditioned  for  small  e. 


Cond(A',x)  = 


.8e  ^ - 3 + 3e 
1 - 1.8e 
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If  the  elimination  were  performed  in  floating  point  arithmetic,  then  a 
slight  perturbation  of  (4.6)  could  result,  which  may  have  a solution  which 
differs  from  the  true  solution  by  an  amount  proportional  to  e This  kind 
of  error  could  not  arise  from  slightly  perturbing  the  original  problem 
because  it  has  a condition  number  of  about  6.  For  example,  suppose  that 
the  computed  rlghthand  side  of  (4.6)  was 


b' 


3 + 3e 

-2  + 4e  - 7 
4 

-1  + E + I 


J 

and  everything  else  was  exact.  Then 


1 - E ^ u/8 
1 + E ^ u/4 


and  by  Theorem  3.2  the  backward  error  is  u/(8e  + u) . 

A related  observation  was  made  by  Gear  [1975]: 


"It  might  be  possible  to  say  that  6A  represents  a perturbation  to 
the  original  physical  problem  if  the  sparsity  structure  of  6A 
were  the  same  as  that  of  A.  Unfortunately,  we  will  show  that  such 
a demand  on  the  structure  of  6A  can  lead  to  very  large  bounds  on 
||6A|[,  bounds  probably  dependent  on  the  condition  number  of  A." 


This  was  supported  by  the  example 


“1 

1 

-1 

-1~ 

1 

0 

E 

0 

0 

1 

-1 

E 

A = 

0 

0 

E 

0 

. b = 

1 

, X = 

-1 

E 

_1 

0 

0 

1_ 

_2_ 

_ 1 _ 

for  which  Cond(A)  = 4. 
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In  one  of  the  theorems  that  follow,  it  is  shown  that  the  situation 

is  not  quite  as  bad  as  Gear  suggests.  The  bounds  on  ||6a||  depend  not  on 

how  ill  conditioned  the  problem  is  but  on  how  badly  scaled  the  equations  are. 

VJe  begin  by  obtaining  a totally  a pinorn-  error  bound  for  Gaussian  elimination. 

The  proof  is  modeled  after  that  of  Forsythe  and  Moler  [1967],  which  is  mostly 

borrowed  from  Wilkinson  [1963].  However,  our  error  bound,  like  that  of 

Van  der  Sluis  [1970b],  is  more  informative  than  that  of  Forsythe  and  Moler 

in  that  it  distinguishes  among  the  columns  of  AA. 

THEOREM  4.1.  Let  the  veatov  x be  computed  by  Gaussian  elimination 

with  vow  pivoting  and  row  scaling  where  D = diag(d, , d_,...,  A ) is  the 

12  n 

matrix  of  reciprocal  scale  factors.  Then  there  exists  LA  such  that 
(A  + AA)x  = b 

with 

II  AA|z||  < x(n)u|l  a|z|| 

for  arbitrary  z ^ 0 and  arbitrary  c satisfying  0 < Ie|  < e where 
X(n)  = fl9-2"~^  - n - 81e^"^. 

PROOF.  See  Appendix  A.  Q.E.D. 

Note  1.  The  factor  appears  in  the  Forsythe  and  Moler  book 

as  the  constant  1.01.  The  advantage  of  is  that  it  Indicates  the 

nature  of  the  higher  order  effects  and  it  does  not  require  placing 
some  arbitrary  restriction  on  the  size  of  nu. 

Note  2.  It  is  actually  possible  to  show  that  |aa|  ^ ^gl^l 
some  lower  triangular  matrix  Lg,  although  the  best  possible  Lg  is 
somewhat  complicated. 

With  z = e in  Theorem  4.1  we  get  the  usual  type  of  bound 
||d‘^  AA||  < x(n)u  ||D^^  All  . 


& 


The  bound  of  this  theorem  is  practically  always  extremely  pessimistic. 
However,  there  are  cases  where  this  bound  can  be  attained  in  the  limit 
as  u -*■  0. 

THEOREM  4.2.  There  exists  a problem  Ax  = b and  a floating  point 
arithmetic  <+,  x,  />  such  that  the  solution  x computed  by  Gaussian 
elimination  with  partial  or  complete  pivoting  satisfies  (A  + AA)x  = b 
only  for  those  matrices  AA  for  which 

- n - 81u  + O(u^). 

Therefore , the  bound  of  Theorem  4.1  is  the  best  possible  bound  in  the 
limit  u 0. 

PROOF.  See  Appendix  A for  the  proof,  which  employs  a modification 
of  Wilkinson's  [1963]  example.  Q.E.D. 

The  next  theorem  uses  Theorem  4.1  to  get  a bound  on  the  backward 
error.  We  are  especially  Interested  in  the  effect  of  scaling  on  the  error 
bound . 

THEOREM  4.3.  Let  I = {i:  ( [ A [ | x [ ) ^ = 0} . If  = 0 for  1 € I 

and  d ^ ^ 0 for  i ^ I , then  the  backward  error 


n £ x(n)u 


‘ - 111 
nln  ( I D ^A I I X I ) , 


PROOF.  Putting  z = |x|  in  Theorem  4.1  gives 
\\D~hh  - AR)||=|1d”^  AAR|1<||1d“^  AA||k||| 

1 X(n)u  II  |d^  A|  |k|  ||. 

Multiplying  this  by  e and  letting  e ->■  0 gives 

max  I (b  - AR) , | _<  x(n)u  max  ( |a|  |R|  ) . , 
le  I ^ i€  I 1 

from  which  it  follows  that  (b  - AR)^  = 0 for  1^  I.  Hence,  from  Theorem  3.2 


we  have  that 
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n 


max 

i4  I 


I (b  - Ax) ^ I 
(|A| \k\)^ 


(lD~^(b  - A*) |)^ 
max  = 

i 4 I (1d~^a| |x|)^ 


and  from  (4.7)  we  see  that 

( |d  ^(b  - A*)  |)^  _<  x(n)u  max  |d-1a||*|. 


Q.E.D. 


By  choosing 

= (|a| |x|).  (4.8) 

the  bound  on  the  backward  error  is  minimized,  giving  n £ x(n)u.  This 
suggests  that  a linear  system  should  be  scaled  by  dividing  each  row  by 
its  weighted  norm  where  the  weights  are  the  components  of  the  computed 
solution.  Unfortunately,  (4.8)  represents  an  implicit  equation  for  the 
scale  factors  d^  because  the  computed  solution  k is  a function  C 01(D))  of 
the  scaling  matrix  D;  that  is,  D must  solve  the  equation 


D = diag(|A| |c(n(D))I), 


(4.9) 


for  which  a solution  may  not  exist.  The  nature  of  this  equation  becomes 
more  apparent  by  noting  that  it  is  equivalent  to  solving  for  a 
permutation  P that  satisfies 

P = n(diag(|A| |C(P)|)).  (4.10) 


For  if  D satisfies  (4.9),  then  P = 11(D)  satisfies  (4.10);  and  if  P 
satisfies  (4.10),  then  D = diag ( | A | | F (P) | ) satisfies  (4.9).  In  principle 
we  could  determine  the  solution  to  (4.9),  if  there  is  one,  by  testing  to  see 
if  any  of  the  n!  permutations  P satisfy  (4.10).  In  the  cases  where  a solution 
exists,  the  backward  error  is  bounded  by  x(>^)u*  One  suspects  that  (4.10) 
almost  always  has  a solution,  and  it  is  even  conceivable  that  (4.10)  always 
has  a solution,  at  least  whenever  C(P)  is  defined  for  all  P.  The  existence 
of  a solution  for  (4.10)  implies  the  existence  of  an  ordering  for  the  rows 
which  makes  Gaussian  elimination  stable. 
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If  one  wishes  to  solve  (4.10),  the  following  iteration  would 
likely  converge  and  converge  quickly  for  almost  every  system  of  equations: 

P(0)  = n(diag( |A|e)) , 

^(nri-1)  " n(diag(|Al |S(P(^))|)),  m = 1.  2,.... 

This  is  not  suggested  as  a practical  algorithm  though  because  poorly 

scaled  equations  can  usually  be  accurately  solved  by  doing  iterative  improvement. 

A more  useful  application  of  Theorem  4.3  is  the  diagnosis  of 
ill-scaling,  for 


Oj^(A,  x)  = 


max 

A 

X 

min 

A 

X 

is  an  easily  computable  measure  of  how  badly  scaled  the  rows  are. 
Remark.  The  bound  of  Theorem  4.1  can  be  refined: 


(Id  ^aa|z) 


± (|D  a|z).. 

^ i < 1 ^ 


This  suggests  that 

max  ( |a| |k| ) . 

iV)  wr*r)t 

would  be  a better  measure  of  the  possible  effect  of  ill  scaling. 

The  quantity  o„(A,  x)  is  not  very  satisfactory  for  theoretical 

purposes  because  k depends  on  the  arithmetic  used  in  the  computation.  We 

would  prefer  to  use  o_(A,  x)  for  the  theory.  For  Hamming's  example 

R 

IaI IxI  = (3  + 6e,  4e)^  and  o_(A,x)  = + 1 . Near  optimal  row 

' » ' » K 4c  4 

scaling  for  this  problem  is  given  by 


D~^A 


3 2 1 


- 2 2 
e 


-2-1 

f 


, D~^b  - 


3 + 3e 
6 
2 


For  Gear's  example  |a|1x|  * (2/e  + 2,  1,  1,  2)  and  Oj^(A,x)  » 2/c  + 2.  Near 
optimal  row  scaling  is  given  by 


y 


L I 
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[ 


d“^a  = 


e 

e 

e 

e 

0 

e 

0 

0 

— 1 

1 

, D b = 

0 

0 

e 

0 

1 

_1 

0 

0 

1_^ 

_2_ 

> i 


One  may  wonder  about  the  effect  of  scaling  strategies  such  as 
row  equilibration.  Van  der  Sluis  [1970b,  p.  80]  gives  an  example  showing 
"that  it  is  quite  possible. .. that  there  exists  no  bound  depending  on  n only 
for  the  ratios  of  the  errors  after  and  before  equilibration."  He  goes  on 
to  describe  a cautious  equilibration  scheme  that  never  worsens  the  situation 
at  the  expense  of  possibly  not  improving  it.  An  adaptation  of  this  scheme 
to  our  theory  is  to  choose 
a . . 

d.  = min  max  , 

k 3 kj 

which  has  the  effect  of  leaving  no  row  of  A strictly  dominated  by  any  other 
row  of  A.  Note  that,  since  d^  £ 1,  we  have  min|D  ^a| |x|  £ a| |x|.  Furthermore, 


(|a1|x|)  = min  ^l~l  l^kj,  I 


< min  max  \-^\  Z [x^| 

k j kj  £ 


£ d^  max|A| |x| , 


whence  max|D  ^a||x|  £max|A||x|.  Therefore 


,„-l.  ^ max  D Ax  , 

Oj^(D  A,  x)  = £37'  1 


max 

A 

A- 

min 

A 

X 

Oj^(A,  x). 


min|D  A| |x| 

so  that  the  scaling  of  the  problem  is  not  made  worse  by  our  choice  of  D. 


1" 


i 
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The  theorem  that  follows  gives  bounds  on  the  backward  error  that 

in  the  limit  u ->•  0 depend  only  on  how  ill  scaled  the  problem  is  and  not 

how  ill  conditioned  it  is.  First  we  need  a lemma. 

LEMMA  4.4.  If  T)  is  nonsingular,  then 


|d-1mx||  < 


I !. 


1 - x(n)u  Cond(A  D) 


PROOF . We  have 
R + A ^AAR  = X, 


which  implies 


D ^AAR  = (I  + d“^AA  ^AAx 


1|d~^AAx||  £ 


1 - |Id“^aa  a ^ d1 


The  term  in  the  numerator 

||d~^aax||  < II  |d"^aa|  IxI 


< x(n)u  II  |d  a|  |x|  II, 


and  the  term  in  the  denominator 


||d~^AA  a ^d|1  = II  |d  ■'•AA  A~-‘'Dle 


1 A A A ” It 


< II  Id  ^AA|  |A"^D|e|l 
£ x(n)u  II  |d  ^a]  |a  ^D|e| 
= x(n)u  II  |d  ^a|  |A  ^d|  II 


Q.E.D. 


THEOREM  4.5.  Let  D be  nonsingular  and  let  |a| lx|  > 0. 

Gaussian  elimination  with  row  pivoting  gives 
X(n)u  0 (d“^A,  x) 

r,  < 

1 - 2x(n)u  Cond(A  ^D)  o„(D  A,  x) 

K 

provided  that  the  denominator  is  positive. 

PROOF.  Choose  AA  as  in  Theorem  4.1.  From  Theorem  3.2  we  have  that 


n = max 
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Since  x = x + A ^ Ax,  we  have 

|x|  < lx  I + I A ^DlellD  ^AAfcll  , 


and  so 


Id  ^A||x|  ^ |d  ^a|1x]  + Id  ^a]  lA~^D|e||D  ^AAx||  . 


Thus 


n < max 


e||P  AA5^[ 

1d"^a| 1*1 


< max 


e||D  ^AAx| 


Id  ^a1|x|  - e Cond(A  ^D)l| d'^^AA* | 
Applying  Lemma  4.4  yields 


X(n)u  0 (D  A,  x) 
e||D  ^AA*||  < 


|d  ^a||x|  , 


1 - x(i')u  Cond(A  D) 
from  which  the  theorem  follows.  Q.E.D. 

Although  we  are  unable  to  prove  that  there  is  always  some 
ordering  of  the  rows  for  which  Gaussian  elimination  is  stable,  we  can  show 
that  this  is  true  asymptotically  as  u ->•  0. 

THEOREM  4.6.  For  any  problem  suah  that  |a|1x|  > 0 there  is  sane 
ordering  of  the  rows  for  which  Gaussian  elimination  is  asymptotically  stable. 

PROOF.  Using  Theorem  4.5  with  D = diag(|A||xl)  gives 
x(n)u 


n < 


1 - 2x(n)u  max 


.-1| 


|A|  |x 

for  small  enough  u.  Hence  for 

Ia| |x| 


A x 


u £ u(n)  = min 

we  have 

n £ 2x(n)u. 


4x(n) |a| |A  ^ I |a| [x 


Q.E  .D. 
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We  end  this  section  by  examining  the  effect  of  scaling  on  a good 

bound  for  the  "forward"  error. 

THEOREM  4.7.  The  error 

||X  _ ^11  < X(n)u  ||A~M|J1.|d~M|x1|1  , 

1 - x(n)u  Cond(A  ^D) 

PROOF.  We  have 

Hr  - x|l  = aax) II 

< I1a"^d11  ||d“W||  , 

and  the  theorem  follows  from  Lemma  4.4.  Q.E.D. 

If  higher  order  terms  in  u are  ignored,  the  bound  on  the 
error  is  minimized  by  choosing  D = diag(  [a  | |x  |) . Thus 

llA-^ll  IIIaIIxIII 
lllA-'llAlMII 

is  a measure  of  the  possible  effect  on  the  "forward"  error  of  how  poorly  the 
equations  happen  to  be  scaled.  For  Hamming's  example  this  quantity  is 

e ^ + 0(1)  and  for  Gear's  it  is  e ^ + 0(1). 

The  usual  type  of  bound  on  the  error  is  of  the  form 
||R  - xjl  1 X (n)u|lA"^D|l  |1d”^a|1  |1x1|  + O(u^)  . 

This  is  minimized  by  D = diag(|A|e),  which  is  row  equilibration  with 
the  norm. 


i 

j 
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5.  Gaussian  Elimination  with  Column  Pivoting  and  Scaling 
This  section  is  similar  to  the  previous  section  except  that  we 
examine  the  variant  of  Gaussian  elimination  in  which  the  columns  are 
interchanged  in  order  to  ensure  that  the  pivot  element  is  the  largest  in 
its  row.  The  algorithm  is  assumed  to  do  column  scaling  where  the  scale 
factors  dj^,  d^,...,  d^  are  given  as  inputs  to  the  algorithm.  Again  the 
selection  of  the  pivot  is  assumed  to  be  done  exactly  so  that 


\i  ‘^il 


< |dj,  i > k + 1 . 


Writing  the  condition  in  this  form  allows  for  the  use  of  zero  scale  factors 
but  does  not  permit  the  selection  of  a zero  pivot. 

An  a priori  error  bound  is  given  by  the  following  theorem: 

THEOREM  5.1.  Let  the  vector  R be  computed  by  Gaussian  elimination 
with  column  pivoting  and  column  scaling  where  D = diag(dj^,  d2,...,  d^)  is 
the  matrix  of  scale  factors.  Then  there  exists  AA  such  that 
(A  + AA)R  = b 

with  |AAD|e  < x(n)u|AD|e 


where  X(n)  = 127*2'’  ^ - 5n  - 

PROOF.  See  Appendix  B.  Q.E.D. 

It  is  likely  that  the  constant  x(n)  in  this  bound  could  be 
replaced  by  a smaller  constant. 

The  following  theorem  indicates  how  the  columns  should  be  scaled 
in  order  that  Gaussian  elimination  with  column  pivoting  be  stable. 

THEOREM  5.2.  Let  H be  the  value  of  A ^b  computed  by  Gaussian 
elimination  with  column  pivoting  and  column  scaling  where  D = diag(dj^,  d2»...,  d^) 
is  the  matrix  of  scale  factors.  Let  I - {1:  (|a||r|)^  = 0}  and  let 
J = {j:  Rj  » 0}.  7/  dj  • 0 for  j c J and  dj  # 0 for  j 4 J.  then  the  backward  error 
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(|AD|e)i 

" - ^(n)u  '1°  ^l^j- 

PROOF.  We  have  R = where  denotes  D with  all  diagonal 

zero  entries  replaced  by  ones.  Hence 

lb  - AR|  = I AAR  I < |AAD|e  | |d^^R| | 

_<  x(n)u  lAD|e  | lo^^Rj  | 

= x(n)u  |AD|e  max  (|d  ^x|).. 

We  have  that  (1a|1r|)^  = 0 implies  that  a^j  = 0 for  j 4 
that  (|AD|e)^  = 0.  Hence  the  theorem  follows  from  Theorem 
COROLLARY.  The  baahjard  error 
(Id'^xI) 

h £ x(n)u  max  ^ . 

(Id  ^xl)^ 

PROOF.  We  have 

e < |d  ^x|  max  7 . Q.E.D. 

j^J  <|D-h|)j 


which  implies 

3.2.  Q.E.D. 


A choice  of  D which  minimizes  the  bound  on  the  backward  error  is 

d . = X . ; 

1 1 

that  is,  we  scale  by  multiplying  the  i-th  column  by  the  i-th  component  of 
the  computed  solution.  Again  these  weights  are  not  known  at  the  time  when 
scaling  is  performed.  The  main  value  of  this  theorem  is  that  it  gives  an 
easily  computable  measure  of  column  ill  scaling: 

/■ . - , I a1  e II  R|| 

o^(A,  X)  = max  • 

For  theoretical  purposes  we  would  prefer  to  use  <^q(A,  x) . For  Hamming's 

1 2 

example  o (A,  x)  = + — . Near  optimal  row  and  column  scaling,  which 

j C J 

would  be  appropriate  for  complete  pivoting,  is  given  by 


1. 


\ 
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THEOREM  5.4.  Let  D be  nonsingular  and  let  |a| lx|  > 0.  Gaussian 


elimination  with  aolujrm  pivoting  gives 


-1 


n < 


x(n)u  o^(AD,  D x) 


.-1 


1 - 2x(n)u  Cond(AD)  a^^CAD,  D x) 


provided  that  the  denominator  is  positive. 

PROOF.  Choose  AA  as  in  Theorem  5.1.  From  Theorem  3.2  we  have  that 

■ aarI 


r)  = max 


A X 


Furthermore 


X = X + A ^AAx, 


and  so 


A x > A x 


IaI Ia  IaaxI . 


Therefore 


n < max 


AAx 


|A| |xl  - 

Applying  Lemma  5.3  yields 


1 A I 1 A ^11 AAx I 


AAR  < 


X(n)u  o^(AD,  D x) 


A xL 


— 1 - x(n)u  Cond(AD) 

from  which  the  theorem  follows.  Q.E.D. 

THEOREM  5.5.  For  any  problem  such  that  |x|  >0  there  is  some 
ordering  of  the  eolwnns  for  which  Gaussian  elimination  is  asymptotically  stable. 
PROOF.  Using  Theorem  5.4  with  D = diag(|x|)  gives 
X(n)u  


n < 


-1 


1 - 2x(n)u  max 


A X 


for  small  enough  u.  Hence  for 
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u < u(n)  = min 


4x(n) |A  |a| |x| 


we  have 


n _<  2x(n)u. 

Recall  that  the  stability  threshold  in  the  case  of  Gaussian 
elimination  with  optimal  row  ordering  was 


u(n)  = min  ' — 

4x(n)lA||A  ||a|1x1 

It  is  easy  to  show  that  this  is  larger  than  the  stability  threshold  for 
optimal  column  ordering.  This  is  a slight  indication  that  row  nivotine 
may  be  superior  to  column  pivoting. 


THEOREM  5.6.  The  ewov 


U - x|l  < l<^.)iLllL^^l|AD|||  ||D'^x| 
I II  — 1 _ ^(n)u  Cond(AD) 


PROOF.  Since 


(A  + aA) (x  - x)  = - AAx, 


X - X = -(I  + A~^AA)“^  a“^AAx 

= - A~^AA(I  + a“^AA)“^x 
= - A~^AAD(I  + d“^A~^AAD)“^  d“^x. 


X - X < 


1 - ||d“^a“^aad| 


|a~^|  [AAP|e  Hd~^xII 
1 - II  |d“^a"^|  lAAolel 


1 ~ x(n)u  Cond(AD) 
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6 , I tera ti ve  Improvemen t 

It  is  often  thought  that  iterative  improvement  is  not  worthwhile 
unless  either  (i)  the  uncertainty  in  the  values  of  A is  less  than  the  unit 
roundoff  error  (e.g.,  if  the  elements  of  A are  integers)  or  (ii)  we  wish  to 
diagnose  ill  conditioning.  This  thinking  is  based  on  the  fact  that  Gaussian 
elimination  with  pivoting  is  stable  from  the  absolute  error  point  of  view. 

But  according  to  the  relative  error  point  of  view,  Gaussian  elimination  may 
not  give  acceptable  accuracy,  and  so  it  is  of  interest  to  examine  the 
stability  behavior  of  iterative  Improvement.  Results  of  a careful  error 
analysis  are  given  for  iterative  improvement  both  with  and  without  double 
precision  accumulation  of  the  residuals. 

The  algorithm  being  considered  is  described  as  follows  where 
subscripts  denote  iterates  rather  than  components  of  vectors: 

= value  of  A ^b  computed  by  row  pivoting, 
for  m = 1,  2,  3, . . . 

r = computed  value  of  b - Ax  , 

d^  - value  of  A ^r^  computed  by  row  pivoting, 

X . , = X + d . 

nrrl  m m 

The  algorithm  for  computing  r^  appears  in  §3  and  the  algorithm  for  d^  is  in  §4. 

The  theorem  which  follows  shows  that  just  one  iteration  of  iterative 
Improvement  with  just  single  precision  accumulation  of  the  residuals  is 
enough  to  make  Gaussian  elimination  asymptotically  stable.  This  may  seem  to 
contradict  the  usual  advice  (Forsythe  and  Moler  [1967,  p.  49])  that  "It  is 
absolutely  essential  that  the  residual  rj^  be  computed  with  a higher  precision 
than  that  of  the  rest  of  the  computation."  Actually  there  is  little  conflict 
because  we  have  shown  that  poorly  scaled  systems  may  be  solved  with  an 
effective  precision  of  much  less  than  single  precision. 
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THEOREM  6.1.  Assume  that  |a| |xl  > 0.  Gaussian  elimination 
followed  by  one  iteration  of  iterative  improvement  results  in  a backward 
error  which  satisfies 

^2  1 (n+l)u  + { (x(n)^  + 2nx(n)  + + 2n)  Cond(A  Oj^(A,  x) 

+ x(n)  x)  + Y + O(u^) 

for  single  precision  residual  accumulation  and 

n'2  £ u + tx(n)^  Cond(A  o^^(A,  x)  + x(n)  x)  + n}u^  + O(u^) 

for  double  precision  residual  accumulation.  That  is,  the  algorithm  is 
asymptotically  stable  in  either  case. 

PROOF.  The  theorem  follows  from  Theorem  C.9  in  Appendix  C.  Q.E.D. 
Note.  The  asymptotic  nature  of  these  bounds  conceals  the  fact  that 
certain  assumptions  on  the  smallness  of  u are  necessary  in  order  to  get  any 
bound  at  all.  The  actual  assumptions,  found  in  Appendix  C,  are  too  lengthy 
to  reproduce  here;  roughly  speaking  it  must  be  assumed  that  the  coefficient 
of  the  second  order  term  is  less  than  1/u. 

Recall  from  Theorem  A. 3 that  for  no  iterations  we  have 
1 X(n)  x)u  + O(u^), 

and  thus  one  iteration  does  make  a big  difference  in  the  size  of  the  bound. 
However,  the  presence  of  the  product  Cond(A  cr  (A,  x)  in  the  second  order 
term  indicates  that  this  may  not  be  the  case  for  problems  which  are 
sufficiently  ill  scaled  and  ill  conditioned. 

THEOREM  6.2.  Assume  |aMx|  > 0.  The  backward  error  n for 

I I I I jji  • 

iterative  improvement  of  Gaussian  elimination  with  row  pivoting  satisfies 

lim  n'  < (n+l)u  + {2n(x(n)  + n+1)  Cond(A  o„(A,  x) 
m K 

+ x(n)  Oj^(A,  x)  + -|  n^  + -|  n + l}u^  + O(u^) 

for  single  precision  residual  accumulation  and 

lim  n"  < u + (x(n)  o„(A,  x)  + n+l}u^  + O(u^) 
m — K 

m-K» 

for  double  precision  residual  accumulation. 
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PROOF.  The  theorem  follows  from  Theorem  C.7  in  Appendix  C.  Q.E.D. 

The  main  effect  of  doing  more  Iterations  with  single  precision 

accumulations  is  a moderate  reduction  in  the  magnitude  of  the  second  term. 

But  for  the  double  precision  case  there  is  a striking  Improvement  due  to 

2 2 

the  disappearance  of  the  x(n)  Cond(A~  ) tT|^(A,  x)u  term  so  that  the  bound 

-1  3 

on  n"  depends  on  the  condition  number  of  A only  through  the  0(u  ) term, 
m 

This  may  represent  a significant  improvement  for  problems  which  are  both 


poorly  scaled  and  ill  conditioned. 
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7.  Practical  Implications 

The  comments  that  follow  are  suggested  by  the  error  analysis, 
but  their  usefulness  remains  to  be  established. 

By  means  of  examples  it  has  been  shown  that  Gaussian  elimination 
with  (partial  or  complete)  pivoting  does  not  generally  provide  all  the 
accuracy  that  the  data  deserve  or  even  a fixed  fraction  of  that  accuracy. 
Hamming  [1971,  p.  121]  states 

"It  is  reasonable  to  ask  how  typical  these  examples  are  and 
how  often  in  the  past  the  pivoting  method  has  created  the 
ill  conditioning  that  was  reported  to  occur  by  some  library 
routines.  The  answers  are  not  known  at  this  time;  all  that 
is  claimed  is  that  textbooks  and  library  descriptions  rarely, 
if  ever,  mention  this  possibility  (though  it  is  apparently 
known  in  the  folklore)." 


and  so  it  seems  that  there  have  been  practical  instances  where  the  pivoting 
method  has  performed  poorly.  Perhaps  Gaussian  elimination  without 
iterative  improvement  should  be  regarded  as  a "quick  and  dirty"  way  to 
solve  linear  equations. 

The  computation  of  the  backward  error  is  one  reliable  test  for 
deciding  whether  or  not  the  solution  of  a linear  system  is  "reasonably 
accurate."  The  test  can  be  made  quite  efficient  by  accumulating  r and 
|a| |k|  + |b|  at  the  same  time.  If  the  test  is  failed,  then  in  most  cases 
the  use  of  iterative  improvement  would  result  in  a solution  which  passes 
the  test.  One  could,  of  course,  forgo  the  backward  error  computaticn  and 
just  do  iterative  improvement  until  "convergence.”  But  such  a procedure 
may  not  be  completely  reliable  since  it  has  not  been  rigorously  proven  that 
"convergence"  implies  a reasonably  accurate  solution.  Stewart  [1973, 
p.  205]  mentions  "the  possibility  that,  with  a violently  ill-conditioned 
matrix,  the  iteration  may  appear  to  converge  to  a false  solution." 


i 
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It  Is  also  suggested  by  the  theory  that  if  double  precision 
accumulation  of  the  residuals  is  costly,  then  Iterative  improvement  with 
single  precision  accumulation  might  still  be  beneficial. 

The  success  of  the  pivoting  method  depends  upon  a reasonable 
scaling  of  the  equations,  which  is  at  best  guesswork  unless  one  has  some 
knowledge  about  the  sizes  of  the  solution  components.  If  c ~ |x|,  then 

(1)  for  row  pivoting  one  should  scale  the  system  to  get 
(Dj^^A)x  = (D^^b)  where  = diag(|A|c). 

(ii)  for  complete  pivoting  one  should  scale  the  system  to  get 

(Dj^AD2)  (D^^x)  = (Dj^^b)  where  = diag(|A|c)  and  D2  = diag(c) 
It  may  be  worthwhile  to  allow  users  of  a linear  equation  solver  to  provide 
an  estimate  of  the  solution,  particularly  if  the  solution  variable  X is  being 
used  only  as  an  output  parameter.  For  simple  use  of  the  program  X could 
be  set  to  all  ones. 
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Appendix  A.  Error  Bounds  for  Rcw  Pivoting 


For  Lemmas  A, 1 through  A. 8 it  is  assumed  that  D has  nonzero  diagonal 

elements.  This  assumption  does  not  apply  to  the  theorem  that  follows 

these  lemmas.  For  any  n x n matrix  C = (c^.)  let  c..  = c.  ,/d..  Also, 

ij  ij  il  1 

let  0)  = 1 + u. 

LEMMA  A.l.  We  have 
Uikl  1 , i > k. 


|-(k)|  k+1  .k-l-l,- 

|a  I < 0)  E (2w)  la 
iJ  11=1 


. k-l|- 

+ w |a,. 


i.j  1 k. 


PROOF.  Equation  (4.1)  implies 


(k),  (k) 


ik  ' “kk  ' ’ 


i > k. 


and  because  of  row  pivoting  (see  (4.5))  we  get  the  first  inequality  of 
the  lemma.  Equation  (4.2)  implies 

^,.L(k)|  . /I  . „(^)|  ,•  1 ^ 1,  j.  1 


|a(k+l) 

' iJ 


-“^ij  ^ + (1  + ^“^l"'ik  I’  i.j  i k + 1 


and  therefore 


1 + “(1  + 2u)|i|JJ^|,  i.j  1 k + 1. 

The  second  inequality  of  the  lemma  follows  from  this  by  induction  on  k.  Q.E.D. 


LEMMA  A.  2.  The  matrices  L = U = satisfy 

LU.  A + E<»  +E<« 

(k)  (k) 

where  the  matrices  E have  elements  which  satisfy 


+ (2  + 3u)  u)  for  i,j  > k , 


I (k) , ; -1  I (k) 

|cij  I < ( 0,  u|a^^ 


for  i > j = k, 
otherwise. 


regardless  of  the  pivoting  strategy. 
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(k) 

PROOF,  Define  the  elements  of  E by 

,(k) 


r (k+1)  ,(k) 

ij  ij  ik  kj 


= < 
Ij  ' 


ik  kk  ik 


Lo 

By  separately  considering  the  cases  i ^ j and  i > j it 

(k)  (k) 

to  show  that  the  elements  of  E satisfy 

(k)  " 

which  establishes  the  equality  of  the  lemma.  Let  k i 
Write  (4.1)  as 

/ (^)/— (^)\/l  1 £ \ 

^k  = ^^k  /\k  ^k^’ 


and  (4.2)  as 

(k+1) 


where  the  6’s  are  relative  roundoff  errors.  Then 


.(k) 

ij 


(^  (k)rt  a^k)/r  4,iC* 

‘y  - "ik  %J  *IJ  «lj 

^k  ^ik 


^0 

and  the  lemma  follows  from  the  bound  u/(l  + u)  on  the  i 
LEMMA  A. 3.  The  matrices  L and  U satisfy 
LU  - A + E 

with 

|||d~^e|z1|  < (3-2"”^  - 3)u)^""^ul|  |d"^a|z|| 
for  arbitrary  z ^ 0, 


for  i,j  > k, 

for  i > j = k, 

otherwise, 
is  straightforward 

1 - 1 be  fixed, 
i > k + 1, 
i,j  ^ k + 1, 

for  i,j  > k, 

for  i > j = k, 

otherwise, 

*s.  Q.E.D. 


( 


PROOF.  Let  E = + E^^^  where  the  E^*^^  are  given 

by  Lemma  A. 2.  Substituting  the  bound  on  of  Lemma  A. 1 into  the  bound 

(k) 

on  the  elements  e . . we  get 
ij 


i-(k)|  -1  |-(k)|  . /o  . 1 \ ~1  |~(k)| 

|e'/|  _<  bi  u|a..  I + (2  + 3u)o)  u|a.  |, 


which,  in  fact,  is  valid  for  all  i,j.  This  implies 

5:  1 3u||  1d~^a^*"^|z|| 

j ^ 

for  arbitrary  z > 0.  From  Lemma  A.l  it  Immediately  follows  that 
II  |d~^A^*^^  I z|l  £ I D ^a1z||;  and  so  we  have 

|||d'^e^'"^|z||  < |d'^a|z||, 

from  which  the  lemma  follows.  Q.E.D. 


LEMMA  A. 4.  We  have 


i > j. 


I-  , ^ 2i-l  _ .,i-«,-l.|-  I 

|u  1 < u)  I |2  I l^l-i  I » 
f=l  ^ 


i 1 j • 


PROOF.  Follows  from  Lemma  A.l  because  = m^^  and  u^^  = a^^  . Q.E.D. 

LEMMA  A. 5.  The  vector  k computed  by  (4.4)  satisfies  (U  + 6U)x  = y 
for  some  upper  triangular  matrix  6U  such  that 

1 g(i,j)cj"“^|u^j|  and  |u^^  + 6u^j  j £ | u^^  | 


where 


g(i.j)  = 


n - j + 2,  j > i + 2, 

n - j + 1,  j = i + 1, 

2,  j = i < n - 1, 

1.  j = i = n. 


regardless  of  the  pivoting  strategy. 


y 
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PROOF.  From  (4.4)  we  have  that 

u.  . R. 

(1+6^)  (1+6 p ■*■  ^•’•^“i,i+l  ^1+1  ^ “l,l+2  ""  ^i+2^"'^  ‘^in  “ ^i’  ^ " 

and 

u X 

nn  n 

1+6’  ° ^n’ 


1, 


where  6.  and  6!  are  relative  roundoff  errors  due  to  subtraction  and  division, 
1 i 

resper  ively.  By  the  usual  type  of  analysis  we  can  obtain  the  bounds 


r 


(u) 


n-j+2 


hui.l  1 J 


(o) 


n-j+1 


- 1)  1 , j ^ i + 2, 

- 1) |u. . I , j = i + 1, 


ij 


(m^  - Dlu^.l, 

(o)  - 1)  |u^j  1 , 


j = i _<  n - 1, 
j = i = n, 


from  which  the  lemma  follows.  Q.E.D. 

LEMMA  A. 6.  The  matrix  6U  of  Lerma  A. 5 satisfies 

II  |d“^L6U|z||  £ f5*2"~^  - 21  u)^"i^|  |d”^a|z|| 

for  arbitrary  z ^ 0. 

PROOF.  From  Lemma  A. 4 it  follows  that 

(|d"^L6U|z)^  < Z Z |d^^  d I |6U|^Jz, 

j k -J 

1 w Z Z |6u  |z  . 
j k J 

Applying  Lemma  A. 5 first  and  then  Lemma  A. 4 gives 

(|d~^L6u|z).  < u Z Z g(k,j)o)""‘'‘^^|G,  Jz 

^ j k J 

n j , k , « , 

_<  u Z Z g(k,j)w  Z [2 

j = l k-1  e=i  J 


2n 


n n j 

< u)“‘  u Z Z Z f2 

11=1  j*il  k»H 


k-e-1 


1 g(k.j) |a^j |zj. 
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After  much  manipulation  it  turns  out  that 


j 


r 1 /I  j \ 1C  I 

max  r f2  1 g(k,j)  = [5*2  J, 


£,<j_<n  k=Ji. 
and  therefore 


(1d~^L6u|z).  < u Z I 5-2"  (|d“-"a1z)  , 


2n 


n-£-2, 


£=1 


from  which  the  lemma  follows.  Q.E.D. 

LEMMA  A. 7.  The  vector  y computed  by  (4.3)  satisfies 
(L  + 6L)y  = b 

for  some  tower  triangutcor  matrix  6L  whose  elements  satisfy 

I £ {n  - j + 1,  n - Dm"  ^u|a^j  | , i i j> 

regardless  of  the  pivoting  strategy. 

PROOF.  We  have  from  (4.3)  that 


= b. 


where  6^  is  a relative  roundoff  error.  By  the  usual  type  of  analysi 


we  get  that 


1 ((1  + 0)  ^u)^“^  - i 2, 

1 ((1  + - l)l£_|,  i > j > 2, 


and 


l«^iil  1 

The  lemma  follows  from  the  inequalities 


(1+0)  ^u)^  - 1 ^ 0)  ^u  k(l  + 0)  ^u)^  ^ 

, k-2 

< ko)  u 


IHjjl  _<  min  (o),  0)^  ^ J I • Q*E.D. 


and 
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LEMMA  A. 8.  The  matriaes  6U  and  6L  of  Lermas  A. 5 and  A. 7 satisfy 
|||d“^6L(U  + 6U)lz|l_<  - n - 3)o)^"i^|  | d”^a|  z 1| 

for  arbitrary  z ^ 0. 

PROOF.  Using  the  bound  of  Lemma  A. 7 and  the  row  pivoting 
inequality  (4.5),  we  get 

^ min{n  - j + 1,  n - l}o)”  ■^u(d^/dj(,  i ^ j- 

Hence 

(|d"^6L(U  + 6U)|z).  u Z minfn  - j + 1,  n - (|d“^(U  + 6U)|z).. 

^ j ^ 

It  immediately  follows  from  Lemma  A. 5 that 

(|d“^(U  + 6U)|z)^  £ o)"“^''’^(1d“^u|z)^ 
and  from  Lemma  A. 4 that 

(|d"^(U  + 6U)|z)j  1 1d'^A|z1|. 

Therefore, 

(|d"^6L(U  + 6U)|z)  m^"u  Z min{n  - j + 1,  n - 1}2^"^|  | d”^a|  z ||, 

j=l 

from  which  the  lemma  follows.  Q.E.D. 

THEOREM  4.1.  Let  the  vector  R be  computed  by  Gaussian  elimination 
with  row  pivoting  and  row  scaling  where  D = diag(d^,  d2»...,  d^)  is  the  matrix 
of  reciprocal  scale  factors.  Then  there  exists  AA  such  that  (A  + AA)R  = b 
with 

II  |Dg,^^A|zl|  £ x(n)u||  |d”^a1z|1 

for  arbitrary  z ^ 0 and  arbitrary  e satisfying  0 < |e|  < e where  x(n)  = 

,,  „n-2  , 2nu 

1 19*  2 - n - 8 1 e 

PROOF.  The  restriction  on  e implies  that  C(n(l>^))  = C(n(D)),  and 


from  Lemmas  A. 3,  A. 6,  and  A. 8 we  have  the  bounds 
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|||d^4|z||  < (3-2"'^  - 3)ai^'"~^,jMDe^Alz||, 

|1|d^^L6uH|  < [5. 2""^  - 21o)^"v^|  |d’^a|z||, 

and 

(((d^^6L(U  + du)(z((j<  (2”''’^  - n - 3)o)^"i;^(  (d^^a(  z (|. 

The  theorem  follows  from  the  equation 

(A  + E + 6LU  + (L  + 6L)6U)X  = b.  Q.E.D. 

THEOREM  4.2.  There  exists  a problem  Ax  = b and  a floating  point 
arithmetic  <+,  *,  T>  such  that  the  solution  5t  computed  by  Gaussian 

elimination  with  partial  or  complete  pivoting  satisfies  (A  + AA)x  = b 
only  for  those  matrices  AA  for  which 


AA 

LU 

a| 

n 

r 

> fl9*2"~^  - n - 81u  + O(u^). 


Therefore,  the  bound  of  Theorem  4.1  is  the  best  possible  bound  in  the 
limit  u ->•  0. 


PROOF.  Obvious  for  n = 1.  Assume  n > 2.  Let 


and 


'l  + (7.2*^  ^ - 
1 + (2"-^^  - n 


J + (19.2 


n-2 


k - 8)u, 

- 7)u, 
n - 8)u, 


k ^ n - 2, 
k = n - 1, 
k = n. 


If  M is  large  enough,  then  there  are  no  Interchanges  even  with  complete 
pivoting.  We  have 

m^j  = m = i > j , 


(k-1) 

in 


m » 


(k-1) 
k-l,n, ’ 


i > k > 2. 


5 


Suppose  that  all  these  floating  point  operations  increase  the  magnitude 
of  the  result  by  a factor  (1  + 2u)/(l  + u).  Then 


m = -(1  + u)  + 0(u  ) 


and 


aj*"^  = (1  + u)  af*^  + (1  + 3u)  + O(u^). 

in  in  k-l,n 

By  induction  on  k it  follows  that 

^(k)  ^ ^k-1  ^ ^ O(u^),  i > k, 

in 


and  hence 


We  have 


where 


Then 


u,  = 2*^“^  + 2^(k-l)u  + O(u^), 
kn 


Yl  = bi 


Sj^  = (...(m  « y^  4-  m * + m « yj^) 


= m 


and 


S,  = S,  , + m « (b,  - S,  J,  2 < k < n - 1. 
k k-1  k k-1  — — 

Suppose  that  all  these  floating  point  operations  reduce  the  magnitude 
of  the  result  by  a factor  1/(1  + u) . Hence 
Si  = -bj  + O(u^) 

and 


Sj^  = (2  - 3u)Sj^_j^  - (1  - 2u)bj^  + 0(u  ),  2 < k £ n - 1. 

By  induction  on  k it  follows  that 

Sj^  = 1 - 2*^  - 2*^‘^^(k  - 4)u  - (k  + 9)u  + O(u^),  k £ n - 2 


and 


S 1=1-  2""^  - 2""^(4n  - 19)u  - (n  + 8)u  + O(u^). 


Also  we  have 


= ? - 2u. 


= (bj^  - - u + 0(u^)),  k ^ 2. 


From  this  we  get 
k 


y.  = 2*^  ^ + 2^(k  - 2)u  + O(u^),  k £ n - 2, 


+ 2"~^(2n  - 5)u  + O(u^), 


y^  = 2""^  + 2"“^(2n  - l)u  + O(u^). 


We  have 


X = y /u  , 
n n nn 


X , = (y  T - u , *51  )/M, 

n-1  n-1  n-l,n  n 


\ = (Vk  ^ (0  + \n  * k < n - 2. 


Suppose  that  all  these  floating  point  operations  reduce  the  magnitude  of 
the  result  by  a factor  1/ (1  + u) . Then 


X = (y  /u  )(1  - u)  = 1 + 0(u"), 
n n nn 


u,  *k=u,  (l-u)=  2"'^  + 2"  ^(2n  - 5)u  + O(u^), 

n-l,n  n n-l,n 


X 1 = 0(u  ), 

n-i 


0 + u,  * X = u,  (1  - 2u)  = 2^  ^ + 2 ^(k  - 2)u  + 0(u  ),  k < 
k,n  n k,n  ~ 


and 


Xj^  = 0(u),  k^n-2. 


Hence 


(b  - Ax)  = b - 1 + O(u^)  = (19-2""^  - n - 8)u  + O(u^). 
n n 


No  matter  how  we  choose  AA  we  get 


II  |AA|  |x|  II  > |IaAx||  = ||b  - Ax||  = (19-2"~^  - n - 8)u  + OCu'^).  Q. 
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Appendix  B.  Error  Bounds  for  Column  Pivoting 


For  any  matrix  C = (c..)  let  c . . = .d. 

13  ij  Ij  j 

LEMMA  B.l.  We  have 


d , Also,  let  u)  =>  1 + u. 


I - I ’ i > k,  j > k, 


i-(k)i  ^ k+1  , 

a . . < 0)  1 (2u)) 


k-l-«,|-  I k-l|- 
la.J+o,  |a.^ 


. i.j  1 k . 


PROOF.  Equation  (4.1)  implies 

I - I • 1 > k.  j 1 k. 


and  because  of  (4.5)  we  get 


|m.,a^^^d.|  < ulaf^^d  j,  i > k,  j > k, 

ik  kj  3 ' — ' IK.  k'  • J _ » 


which  proves  the  first  inequality.  Equation  (4.2)  implies 

1 1 I + (1  + 2u)  i . i,3  i k + 1, 

and  therefore 


|af’^'‘'^^|  <o)|af*^^l  + u(l  + 2u) 

'13  ' - ' 13  ' 


a.,,.  . 


l,j  1 k + 1. 


The  second  inequality  of  the  theorem  follows  from  this  by  induction  on  k.  Q.E.D. 
LEMMA  B.2.  The  matriaes  L and  U satisfy 
LU  = A + E 


with 

|ED|e  £ 1 7*  2”  ^ - n-2J  u^*'u|AD|e. 

PROOF.  Assume  n > 2.  Let  E = E^^^  + E^^^  +. . .+  where 


(k) 

the  E are  given  by  Lemma  A. 2.  Substituting  the  bound  on  m^j^  of  Lemma  B.l 
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This  implies 


1 (2n  - 2k+l)o)u|i^[j^|  + u I 1 > k. 


ij 


j=k+l 


Ij 


and  from  Lemma  B.l  it  follows  that 


k-1 


1 (2n  - 2V.+l)i^^K{  z^ 


£=1 


P. 


^ , , , 2k-l  ,k-l-«,  I - I . k-1  " I - I 

+ (n-k)w  u Z 2 ^ ^ 

1^1  ^ j=k+l  ^ 


’(2n-l)(o  u z|a 


j 


ij 


,(3n  - 3k+l)u)^'^u  2^"^  S|a 


ij' 


if  k - 1, 
if  k > 2. 


The  lemma  follows  because 


H”  1 It  ^ ^ 

(2n-l)  + E (3n  - 3k+l)2  - n-2.  Q.E.D. 

k=2 


LEMMA  B.3.  We  have 

“ijl  < 

E \2 

£=1 


iliiJ. 


i > j. 


and 


l“ij  ' - I’^li'  ’ ^ - J* 


PROOF.  The  first  two  inequalities  follow  immediately  from 
Lemma  B.l,  and  the  third  Inequality  is  a consequence  of  column  pivoting.  Q.E.D. 
LEMMA  B.4.  The  matrix  6U  of  Lemma  A. 5 satisfies 
|l  5U  D|e  £ (2"''’^  - n-2)(o^"  u|AD|e. 

PROOF.  Applying  the  inequality  Lemma  B.3  and 

the  bounds  of  Lemma  A. 5,  we  get  for  i < n 


I g(i,j)u"  1 

j ^ j = i 


. (n-i+2) (n-i+1)  n-i  i-  i 

1 2 “ 


Therefore 


(lL6UD|e).  = Eli  I 5:|6u  | 

j J J 


- j ^ ' ij  Jj' 


and  so  using  Lemma  B.3  gives 

(|L6UD|e).  < u)^"u  Z 1 r2^"^~^l|i  1 

^ j=l  «,=! 

<,2n,  I iBzl-^2)(n-j^l)  ^ 

3=1  ^ 1=1  ^ 

from  which  the  lemma  follows.  Q.E.D. 

LEMMA  B.5.  The  matrices  6U  and  6L  of  Lemmas  A. 5 and  A. 7 satisfy 

|6L(U  + 6U)D|e  < 3 (2"-n-l)w^"u | AD | e . 

PROOF.  Applying  the  inequality  I I £ Lemma  B.3  and 

the  bounds  of  Lemma  A. 5,  we  get 


_<  (n-i+l)u)"~^''’^|u^^|  . 

Therefore  applying  the  bounds  of  Lemma  A. 7 gives 
(|6L(U  + 6U)D|e)^  = Z | | Z | u^j^  + 6u^^\ 

_<  Z min  {n-j+1,  n-l}u)”  ^ | a|j  Vaj j ^ | (n-j+l)o)"  ’ 


and  so  using  Lemma  B.3  gives 
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Appendix  C.  Error  Bounds  for  Iterative  Improvement 


LEMMA  C.l.  The  computed  residual  satisfies 


r = b - Ax  + c 
tn  m tn 


|c  I <u|b-Ax  I +wIa|Ix  I 
' m'  — ' m'  ‘ ' in' 


where 


ra  + 6)  [(1  + fi)"  - 1] 

Ul  + u)  [(1  + u^)" 


for  s.p.  aocum. 


for  d.p.  aocum. 


and  u = u/ (1  + u) . 


PROOF.  The  computed  residual 


r = b - (Ax  + c')  + c" 
m m m m 


where  c'  is  the  error  due  to  computing  Ax  and  c"  is  the  rest  of  the  error, 
m ° m m 


For  the  single  precision  case  we  have 


kj  1 [(1  + u)"  - 1]  1a|  |x 


1 < ul b - Ax  - o'  I 
m'  — ' m m‘ 


so  that 


c I < !c’|  + |c"| 
m'  — ' m'  ' m' 


^ u|b  - Ax^l  + (1  + u)  [(1  + u)  - 1]  |A| |x^| 


For  the  double  precision  case  we  have 


IcJ  1 [(1  + u^)"  - 1]  lAlIx^l 


|c"|<u|b-Ax  -c'l 
' m'  — ' m m' 


so  that 


|c  I < |c» I + |c"| 
' m'  — ' m'  ' m' 


< u|b  - Ax  I + (1  + u)  [(1  + u^)"  - 1]  |a||x J.  Q.E.D. 


1 

i 

J 
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(|6L(U  + 6U)D|e)^  < u S min  {n-j+1,  n-l}(n-j+l)  I 


£ u)  u I min  {n-j+1,  n-l}(n-j+l)  {2^”'^!  S |a  . |, 
j=l  Jl=l 

from  which  the  theorem  follows.  Q.E.D. 

THEOREM  5.1.  Let  the  sector  R be  computed  by  Gaussian  elimination 

with  column  pivoting  and  column  scaling  where  D = diag(dj^,  d2» , d^) 

is  the  matrix  of  scale  factors.  Then  there  exists  A A such  that 

(A  +AA)X  = b 


|AAD|e  ^ x(a)u|AD|e 

where 

X(n)  = 127.2""^  - 5n  - 7Je^""*. 

PROOF.  The  theorem  follows  from  the  bounds  of  Lemmas  B.2,  B.4, 
and  B.5  and  from  the  equality 

(A  + E + 6LU  + (L  + 6L)6U)X  = b.  Q.E.D. 


I 


LEMMA  C.5.  Ve  have 


11m  z 


uv  + w 


m-x” 


m"  - 1 - T 


provided  that  t < 1 where 

T = u + (2v  + uv  + w)y- 
PROOF.  From  Lemma  C.4 


< J.  - .VT 


uv  + w 


' ' nri-1 1-vy"  m 1-vy 
LEMMA  C.6.  We  have 


X . Q.E.D. 


l^n.1  -|l|A|M||e}  lll‘l|x|ll» 


m^ 


provided  that  t < 1. 

PROOF . From  Lemma  C . 4 


li™  1^1  £u  lim  |z  I +«r{|Al|x|  - 1|  | a|  | x|  ]|  e} 


tn-^ 


m 


uv  + w 


+ {(.\  ~ ^ - U)  lim  Hz  II  + - 

1 - VY  " m"  1 - VY 


and  the  lemma  follows  from  Lemma  C.5.  Q.E.D. 

THEOREM  C. 7.  Assume  |a||x|  > 0.  Then 


lim  n < 
m "" 


(1  + UY)  (uv  + w)o  w(o  - 1)  • 

1 - T ~ 1 - U 


1 - U - 


(1  + u)  (uv  -F  v)yc 


1 - T 

provided  that  (1  + u) (uv  + w)yo  < (1  - u)(l  - t)  where  a ■ x) 

PROOF.  From  Lemmas  C.5  and  C.6  we  have 
(uv  + w)g 


lim  ll^n,l|ei  1 ;-^|a||x| 

mxx) 


and 


|2  I < ^_(uv. tjoo  _ 
' m'  — 1 - T 


I - u 


}|a| |x| . 


Hence  using  Lemma  C.3, 


i 
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LEMMA  C.2,  Define  z = A(x  + d - x) . Then 

m mm 

|z^|  < u|a(x^  - x)|  + wIaI jx^l 

+ (1  - vy)“^v  dl  |a|  Ix^  - x]  II  + Yu|  |A(x^  - x)1|  + Yw||  [Aj  lx^|  l|)e 

where  v = x(n)“  and  y = Cond(A  ^). 

PROOF.  The  correction  term  d_  = (A  + F ) ^r  where  F is  the  AA 

m mm  m 

of  Theorem  4.1.  We  have 

z = A(x  - x)  + r - (I  + F A ^ F A ^r 
m m m m mm 

= c - (I  + F a"^)“^  F (x  - X - A“^  c ). 
m m m m m 

It  follows  from  the  bounds  of  Theorem  4.1  that 

\ 1 |c^l  + (1  - vy)"^v  dllA||x^  - x|||  +Y|(c^||)e. 

Substituting  thr  bounds  of  Lemma  C.l  into  this  proves  the  lemma.  Q.E.D. 


LEMMA  C.3.  We  have 


Ia(x  - x)(  < (z  [ + u[a([x[  + uy[(z J|e 


l^m+1  " - “1^1  1*1  + (1  + u)Y||z^I|e- 

PROOF.  The  new  iterate 


X . , = X + d + g 

nri-1  m m m 

where  |g^|  £ u|x  + Equivalently 

*nH-l  “ * ®m 

where  |g^|  £ u|x|  + u|A  , from  which  the  lemma  follows.  Q.E.D. 

LEMMA  C.4.  We  have 

hnH-ll  - “^l^ml  ■ ll^mll®^  *^1^1  1*1  ' H 1^1  1*1 

+ (1  - vy)  ^ { [u  + y(v  + uv  + w)  ] ||z^||  + (uv  + w)||  I a|  lx|  ||}e 

2 

where  Cr  = u + w(l  + u). 


PROOF.  Substituting  the  bounds  of  Lemma  C.3  into  those  of 


Lemma  C.l  proves  the  lemma.  Q.E.D. 


PROOF.  Substituting  the  Inequality  of  Lemma  C.8  into  those 


of  Lemma  C.3  gives 

|a(X2  - X)  I < |a|  |x|  + 1^1 1^1 11^ 

^ ^ (1  - VY)^ 

and 

|a1  U - x|  j<  u|a|  |xl  + t -HV  + y y)  ^ U |^|  |^| 

(1  •-  vy)^ 

and  the  theorem  follows  from  the  inequality 

|a(x2  - x| 

n_  < max  — i i-t-t-i r . O.E.D. 


lA(x  - x)|  < (d  uy)(uv  frr)g 


w(g  - 1) 
1 - u 


+ u}  |a| )x) 


and 


ii;;iA||x_-x|  < {„  + |A||x|. 


m-Ko 


The  theorem  follows  from 

[A(x  - x)| 


n - max 
m 


lA(x^  - x)| 

|a]-[^-"|a|Tx  - x|  • 


m 


LEMMA  C . 8 . Vie  have 


and  thus 


So 


hJ  iw|A||x|  -H 

^ (1  - VY)^ 

PROOF,  The  first  iterate  x^^  = (A  + AA)~^Ax, 


x^  - X = -A~^(I  + AA  A~^)"^AAx. 


||a(Xj^  - x)I|  _<  (1  - vy)  II  |a|  |x 


X e. 


and 


|I|a(  |xj^  - x|  II  < (1  - vy)~^vy  II  |a|  |x|  ||. 
From  Lemma  C.2  we  have 


I 2ll  1 w|aI  |xl 

+ (1  - vy)  ^ { (v  + w)||  |a|  |x^  - x|  (I  + u|(a(Xj^  - x)  II  + vwyII  |a|  [x]  ||}. 
The  lemma  follows  by  substituting  the  previous  two  inequalities  into  this.  Q.E.D. 
THEOREM  C.9.  Assume  that  |a||x|  > 0.  Then 


^ ^ ^ (v  + u)wYa  (u  + VY  -F  wy)(1  + uY)va 

1 - VY  .2 

„^  < ^ 

l - a - (1  + u)YO 

(1  - vy) 


provided  that  the  denominator  is  positive. 
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